Abstract. Recognizable languages of finite words are part of every computer
science curriculum, and they are routinely described as a cornerstone
for applications and for theory. We would like to briefly explore why that 
is, and how this word-related notion extends to more complex models, 
such as those developed for modeling distributed or timed behaviors. 



Ἐν ἀρχῇ ἦν ὁ λόγος. . .

In the beginning was the Word. . . 

Recognizable languages of finite words are part of every computer science
curriculum, and they are routinely described as a cornerstone for applications and
for theory. We would like to briefly explore why that is, and how this word- 
related notion extends to more complex models, such as those developed for 
modeling distributed or timed behaviors. 

The notion of recognizable languages is a familiar one, associated with classical
theorems by Kleene, Myhill, Nerode, Elgot, Büchi, Schützenberger, etc. It can
be approached from several angles: recognizability by automata, recognizability
by finite monoids or finite-index congruences, rational expressions, monadic
second order definability. These concepts are expressively equivalent, and this leads
to a great many fundamental algorithms in the fields of compilation, text
processing, software engineering, etc. Moreover, it surely indicates that the class
of recognizable languages is central. These equivalence results use the specific 
structure of words (finite chains, labeled by the letters of the alphabet), and the 
monoid structure of the set of all words. 

Since the beginnings of language theory, there has been an interest in
models other than words - especially for the purpose of modeling distributed or timed
computation (trees, traces, pomsets, graphs, timed words, etc.) - and in
extending to these models the tools that were developed for words. For many
models, some of these tools may not be defined, and those that are defined may
not coincide.

In this paper, we concentrate on the algebraic notion of recognizability: that 
which, for finite words, exploits the monoid structure of the set of words, and 
relies on the consideration of monoid morphisms into finite monoids, or equiv- 
alently, of finite-index monoid congruences. Our aim is to examine why this 
particular approach is fruitful in the finite word case, and how it has been, or
can be, adapted to other models.

In Sect. 1 we explore the specific benefits of using algebraic recognizability
for the study of word languages. It opens the door to a very fine classification of 
recognizable languages, which uses the resources of the structural theory of finite 
monoids. This classification of recognizable languages is not only mathematically 
elegant, it also allows the characterization and the decision of membership in 
otherwise significant classes of languages. 

The emblematic example of such a class is that of star-free languages, which 
are exactly the first-order definable languages, those that can be defined by a 
formula in propositional temporal logic, those that are recognized by a
counter-free automaton, and those whose syntactic monoid is aperiodic (Schützenberger,
McNaughton, Papert, Kamp). Only the latter two characterizations lead to a
decision algorithm, and the algebraic approach makes this algorithm the clearest. 

The example of star-free languages is however not the only one; for exam- 
ple, the various notions of locally testable languages are also characterized, and 
ultimately efficiently decided, by algebraic properties (Simon, McNaughton, Lad- 
ner). The power of finite monoid theory leads in fact to an extremely fine clas- 
sification (Eilenberg), where for instance, natural hierarchies within first-order 
or temporal logic can be characterized as well (Brzozowski, Knast, Thomas,
Thérien, Wilke).

Already in the 1960s, fundamental results appeared on notions of automata 
for trees and for infinite words, linking them with logical definability and rational 
expressions (Büchi, Doner, Mezei, Thatcher, Wright). An algebraic approach to
automata-recognizable languages of infinite words was introduced in the early
1990s, and as in the case of tree languages, it requires introducing an algebraic
framework different from monoid theory (Perrin, Pin, Wilke), see Sect. 4.1.

In fact, very early on, Elgot and Mezei extended the notion of recognizability 
to subsets of arbitrary (abstract) algebras. But the notion of logical definability 
for these subsets strongly depends on the combinatorial (relational) structure 
of the objects chosen to represent the elements of the abstract algebra under 
consideration. In many situations, the problem is posed in the other direction: 
we know which models we want to consider (they are posets, or graphs, or traces, 
or timed words, as arise from, say, the consideration of distributed or timed 
computation) and we need to identify an algebraic structure on the set of these 
objects, for which logical definability and algebraic recognizability will be best 
related. One key objective there is to be able to decide logical specifications.
Note that models of automata, while highly desirable, are not known to exist in 
all the interesting cases, and especially not for graphs or posets. In contrast, the 
algebraic and the logical points of view are universal. 

A class of relational structures being fixed, Courcelle gave very loose condi- 
tions on an algebraic structure on the set of these structures, which guarantee 
that counting monadic second order definability implies recognizability. The con- 
verse is known to hold in a number of significant cases, but not in general. 



In Sect. 4 we discuss some of the relational structures that have been studied
in the literature, and the definability vs. recognizability results that are known 
for them: trees, infinite words, traces, series-parallel pomsets, message sequence 
charts and layered diagrams, graphs, etc. The main obstacle to the equivalence 
between definability and recognizability is the fact that the algebras we consider 
may not be finitely generated. In contrast, a good number of situations have 
been identified where this equivalence holds, and each time, a finiteness (or 
boundedness) condition is satisfied. We will try to systematically point out these 
finiteness conditions, and we will identify some important questions that are still 
pending. 

When all is said and done, the central question is that of the specification and 
the analysis of infinite sets by finite means (finite and finitely generated algebras, 
finite automata, logical formulas, etc.), arguably the fundamental challenge of 
theoretical computer science. This paper presents a personal view on the rel- 
evance of algebraic recognizability for this purpose, beyond its original scope 
of application (languages of finite words), and an introduction to some of the 
literature and results that illustrate this view. I do not claim however that it 
constitutes a comprehensive survey of the said literature and results (in partic- 
ular, I chose to systematically refer to books and survey papers when available), 
and I apologize in advance for any omission.

1 The Finite Word Case 

In the beginning were the (finite) words, and loosely following the Biblical
analogy, one could say that the spirits of Kleene, Büchi and Schützenberger flew over
the abyss, organising it from chaos to beauty.

Throughout this paper, A will denote an alphabet, i.e. a finite, non-empty 
set. We denote by A* the set of all finite words on alphabet A. 

1.1 The Classical Equivalence Results 

We are all familiar with the notion of regular languages, but there are in fact 
several competing notions, that turn out to be equivalent for finite words. Each 
is interesting in its own right, as it reveals a fruitful point of view, syntactic or 
semantic, denotational or operational. The results of this section can be found in 
many books, and in particular in those of Eilenberg, Pin [67,68,69], Straubing [77],
Sipser and Sakarovitch.

Recognizability by Automata. One can first consider languages recognized 
by finite state automata, whether deterministic or non-deterministic. Every lan- 
guage recognized by a finite state automaton admits a unique minimal deter- 
ministic automaton, which is effectively computable. 

The notion of a deterministic automaton can also be expressed in terms of a 
finite-index semi-congruence, and in terms of an action of the free monoid on a 
finite set. 



Algebraic Recognizability. One can also consider languages recognized by a 
finite monoid. This exploits the monoid structure of A* , the set of all words on 
alphabet A: if M is a monoid and $\varphi\colon A^* \to M$ is a morphism of monoids, we
say that $L \subseteq A^*$ is recognized by $\varphi$ (or by M) if $L = P\varphi^{-1}$ for some $P \subseteq M$,
or equivalently if $L = L\varphi\varphi^{-1}$. Here too, for every language L recognized by a
finite monoid, there exists a least finite monoid recognizing L, called the syntactic 
monoid of L. 
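
The definition can be made concrete with a minimal sketch. The language chosen here (words over {a, b} with an even number of a's), the monoid Z/2Z and all function names are assumptions made for illustration; they do not come from the text.

```python
# Sketch: recognizing a language by a morphism into a finite monoid.
# Assumed example: L = words over {a, b} with an even number of a's,
# recognized by the additive monoid Z/2Z with accepting subset P = {0}.

def morphism(word, letter_images, product, identity):
    """Extend a map defined on letters to a monoid morphism A* -> M."""
    value = identity
    for letter in word:
        value = product(value, letter_images[letter])
    return value

def add_mod_2(x, y):
    """Product of the monoid Z/2Z."""
    return (x + y) % 2

LETTER_IMAGES = {"a": 1, "b": 0}  # images of the letters in Z/2Z
ACCEPTING = {0}                   # the subset P of M

def in_language(word):
    """Membership in L, decided by looking only at the image of the word."""
    return morphism(word, LETTER_IMAGES, add_mod_2, 0) in ACCEPTING
```

Membership of a word depends only on its image in the finite monoid, which is exactly what recognition by a morphism requires.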

Rational Expressions. Rational expressions describe languages using the letters
of the alphabet, the constant $\emptyset$, and the so-called rational operations of
union, concatenation and Kleene star (if L is a language, L* is the submonoid 
of A* generated by L). 

It should be noted that for a given rational language, there is no notion of a 
unique minimal rational expression describing it. 

Logical Definability. Büchi's sequential calculus exploits the combinatorial
structure of words, as A-labeled, linearly ordered finite sets: in this logical
formalism, individual variables are interpreted to be positions in a word, and the
predicates are $i < j$ (to say that position i is to the left of position j) and $R_a i$
(to say that position i is labeled by letter $a \in A$). First order formulas (FO) use
only individual variables, whereas monadic second order formulas (MSO) also
use second order variables, interpreted to be sets of positions. To a formula $\varphi$ in
this language, one associates the set $L(\varphi)$ of all words which satisfy $\varphi$, that is,
$L(\varphi)$ is the language of the finite models of $\varphi$.
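
As a small worked example (the formula is chosen here for illustration and is not taken from the text), the following first order sentence states that some position labeled a occurs to the left of some position labeled b. Its finite models are exactly the words containing an a followed, not necessarily immediately, by a b.

```latex
\varphi \;=\; \exists i\, \exists j\, \bigl( i < j \,\wedge\, R_a i \,\wedge\, R_b j \bigr),
\qquad
L(\varphi) \;=\; A^* a A^* b A^* .
```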

The Kleene-Nerode-Myhill-Büchi Theorem. Theorems by Kleene,
Nerode, Myhill and Büchi show that the notions briefly described above coincide.

Theorem 1.1. Let $L \subseteq A^*$. Then L is recognized by a finite state automaton, if
and only if L is recognized by a finite monoid, if and only if the syntactic monoid
of L is finite, if and only if L is described by a rational expression, if and only
if L is defined by an MSO-formula of Büchi's calculus.

Moreover, there are algorithms to pass from any one of these specification
formalisms to any other.

It is interesting to note that the many closure properties of the class of 
recognizable languages are easily established in an appropriate choice of one 
of these equivalent formalisms. For instance, closure under Boolean operations 
easily follows from the definition of algebraic recognizability, as does closure 
under inverse morphism (inverse rewriting). On the other hand, closure under 
concatenation, star and direct morphism is a triviality for languages described 
by rational expressions. 

None of the equivalences in Theorem 1.1 is very difficult, but their proofs
really use the different points of view on words and languages. If we compare 
these results with the situation that prevails for other models than words, it is 



in fact a very exceptional situation to have these notions be so nicely defined 
and be equivalent. 

1.2 Classification of Recognizable Languages 

With each recognizable language L, we can associate a computable canonical 
finite object - in fact two closely related such objects: the minimal automaton of 
L, and its syntactic monoid. The connection between them is tight: the syntactic 
monoid of L is exactly the monoid of transitions of its minimal automaton. 
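
The connection can be illustrated by a short sketch that computes the monoid of transitions of a complete deterministic automaton. The automaton used in the usage note (a minimal automaton for a*b), the encoding of state maps as tuples and the function name are assumptions made for this example.

```python
from collections import deque

def transition_monoid(num_states, transitions):
    """Compute the monoid of state maps induced by all words.

    transitions: dict letter -> tuple t, where t[q] is the state reached
    from state q by reading that letter (complete deterministic automaton).
    Returns a dict mapping each state map to a shortest word inducing it.
    """
    identity = tuple(range(num_states))
    words = {identity: ""}
    queue = deque([identity])
    while queue:
        f = queue.popleft()
        for letter, t in transitions.items():
            g = tuple(t[q] for q in f)  # read the word of f, then the letter
            if g not in words:
                words[g] = words[f] + letter
                queue.append(g)
    return words

# Assumed example: minimal automaton of a*b over {a, b}
# (state 0 initial, state 1 accepting, state 2 sink).
A_STAR_B = {"a": (0, 2, 2), "b": (1, 2, 2)}
```

For this automaton, `transition_monoid(3, A_STAR_B)` has four elements, induced by the empty word, a, b and ba. If the automaton is not minimal, the monoid obtained still recognizes the language but need not be its syntactic monoid.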

This paves the way for a fine classification of recognizable languages (see 
[67,69]). Not surprisingly, it is the syntactic monoid, with its natural algebraic
structure, which offers the strongest classification tool. In this section, we give
a few instances of this classification: some are well-known and open up important
applications (star-free languages, locally testable languages), some are more
specific, and demonstrate the degree of refinement allowed by this method. 

Star-Free Languages. The most illuminating example of a significant subclass
of recognizable languages is given by the star-free languages. These are the
languages which can be described by star-free expressions, i.e., using the letters of
the alphabet, the constants $\emptyset$ and 1 (the empty word), the Boolean operations
and concatenation (but no star).

The characterization of star-free languages requires the following definitions:
a deterministic automaton is said to be counter-free if whenever a non-trivial
power $u^n$ ($u \neq 1$, $n \neq 0$) labels a loop at some state q, then the word u also
labels a loop at the same state. A finite monoid M is said to be aperiodic if
it contains no non-trivial group, or equivalently, if for each $x \in M$, $x^n = x^{n+1}$ for
all large enough n. Finally, PTL (propositional temporal logic) is a modal logic,
interpreted on positions in words, with modalities next, eventually and until.

The following statement combines results by Schützenberger, McNaughton,
Papert and Kamp, see [33,49,67,77].

Theorem 1.2. Let $L \subseteq A^*$. Then L is star-free, if and only if L is recognized
by a finite aperiodic monoid, if and only if the syntactic monoid of L is finite
and aperiodic, if and only if L is recognized by a counter-free automaton, if and
only if L is defined by an FO-formula of Büchi's calculus, if and only if L is
defined by a PTL-formula.

Moreover, there are algorithms to pass from any one of these specification
formalisms to any other.
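
The algebraic criterion suggests a direct, if naive, decision procedure: generate the monoid of transitions of the minimal automaton and check that the powers of each element eventually become constant. The sketch below is one way to do this, under the assumption that the input is a complete minimal deterministic automaton given by the state maps of its letters; the two sample automata and all names are chosen for illustration.

```python
def compose(f, g):
    """State map of the word uv, given the state maps f of u and g of v."""
    return tuple(g[q] for q in f)

def generated_monoid(generators, num_states):
    """Close the identity map under right multiplication by the generators."""
    identity = tuple(range(num_states))
    elements, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = compose(f, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def is_aperiodic(elements):
    """Check that x^n = x^(n+1) for large n, for every element x."""
    for x in elements:
        powers, p = [], x
        while p not in powers:
            powers.append(p)
            p = compose(p, x)
        # p repeats powers[i]; the powers of x cycle with period len(powers) - i.
        if len(powers) - powers.index(p) != 1:
            return False
    return True

# Assumed examples, given by the state maps of the letters a and b.
A_STAR_B = [(0, 2, 2), (1, 2, 2)]  # a*b: 0 initial, 1 accepting, 2 sink
EVEN_A = [(1, 0), (0, 1)]          # even number of a's: 0 even, 1 odd
```

With these inputs, the monoid of a*b is aperiodic, while the letter a acts as a permutation of order 2 in the second automaton, so the language of words with an even number of a's is not star-free. The procedure enumerates the whole monoid, whose size may be exponential in the number of states.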

Thus, the class of star-free languages, with its natural definition in terms 
of generalized rational expressions, ends up having natural characterizations in 
terms of all the formalisms used in Theorem 1.1 - to which we can add PTL, a
logical formalism considered to be very useful to specify the behavior of complex 
systems. 

The historically first side of this result is the algebraic one, which links star- 
free languages and aperiodic monoids. It is of particular interest for two reasons. 



First because it offers an algorithm to decide whether a language is star-free; and 
second, because it shows that the algebraic structure of the syntactic monoid 
of a recognizable language (not just its finiteness) reflects the combinatorial
properties of that language. This gave the first hint of Eilenberg's theorem, 
discussed further in this section. 



Variants of FO-Definability. Several refinements and generalizations further
reinforce the significance of Theorem 1.2.

Consider the extension FO+MOD of FO, where we also allow modulo
quantification of the form $\exists^{\mathrm{mod}\, q} x\, \varphi(x)$ ($q > 1$). Such a quantification is interpreted to
mean that the set of values x for which $\varphi(x)$ holds has cardinality a multiple of q.
Straubing, Thérien and Thomas showed that a language is FO+MOD-definable
if and only if the subgroups of its syntactic monoid are solvable [77].

Considering now subclasses of FO, it turns out that every star-free language
can be defined by an FO-formula using only 3 variables. Let FO2 be the class of
star-free languages defined by FO-formulas with only 2 variables. Let also DA be
the class of finite monoids in which every regular element is idempotent (if $xyx = x$
for some y, then $x^2 = x$). Then a combination of results of Etessami, Pin,
Schützenberger, Thérien, Vardi, Weil, Wilke [74,70,37,80] shows that a language
L is in FO2, if and only if it is defined by a $\Sigma_2$- and by a $\Pi_2$-formula, if and only
if its syntactic monoid is in DA, if and only if L is defined by a PTL formula
which does not use the until modality, if and only if L can be obtained from the
letters using only disjoint unions and unambiguous products (and the constants
$\emptyset$ and $A^*$).

It is well-known that every FO-formula is equivalent to one in prenex normal
form (consisting of a sequence of quantifications, followed by a quantifier-free
formula). This gives rise to the classical quantifier-alternation hierarchy, based on
counting the number of alternated blocks of existential and universal quantifiers.
Another natural hierarchy, seen from the point of view of star-free expressions,
defines its (n+1)-st level as the Boolean closure of products of level n languages
(and level 0 consists of $\emptyset$ and $A^*$). This is the so-called dot-depth hierarchy.
Thomas showed that these two hierarchies coincide, that is, a language can be
defined by an FO-formula in prenex normal form with n alternating blocks of
quantifiers, if and only if it is in the n-th level of the dot-depth hierarchy [83].
Decidable algebraic characterizations were given for level 1 of these hierarchies,
but the decidability of level 2 and the further levels is still an open question.
It was however shown (Brzozowski, Knast, Simon, Straubing) that
the hierarchy is infinite (if $|A| \geq 2$), and that each level is characterized by
an algebraic property, in the following sense: if two languages have the same
syntactic monoid and one is at level n, then so is the other one.

There is also a natural hierarchy on PTL-formulas, based on the number of
nested uses of the until modality. Thérien and Wilke showed that the levels
of this infinite hierarchy are characterized by the algebraic properties of the
syntactic monoid, and that each is decidable.



Communication complexity. The communication complexity of a language L
is a measure of the amount of communication that is necessary for two partners,
each holding part of a word, to determine whether the word lies in L, see [51].
Tesson and Thérien showed that the communication complexity of a recognizable
language is entirely determined by its syntactic monoid, and that it can be
computed on this basis [78].

Piecewise and Locally Testable Languages. A word v is a subword of a
word u if $v = a_1 \cdots a_n$ and $u = u_0 a_1 u_1 \cdots a_n u_n$ for some $u_0, \ldots, u_n \in A^*$. It is a
factor of u if $u = xvy$ for some $x, y \in A^*$.
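
The two notions are easy to confuse, so here is a minimal sketch of both tests, with words represented as Python strings (the function names are chosen for this example).

```python
def is_subword(v, u):
    """True if v is obtained from u by deleting letters (scattered subword)."""
    remaining = iter(u)
    # Each letter of v must be found in u strictly after the previous match.
    return all(letter in remaining for letter in v)

def is_factor(v, u):
    """True if u = x v y for some words x and y (contiguous occurrence)."""
    return v in u
```

For instance, ac is a subword of abc but not a factor of it, whereas bc is both.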

A language L is said to be n-piecewise testable if whenever u and v have the
same subwords of length at most n and $u \in L$, then $v \in L$. The language L is
piecewise testable if it is n-piecewise testable for some n.

A language L is said to be n-locally testable if whenever u and v have the
same factors of length at most n and the same prefix and suffix of length $n - 1$,
and $u \in L$, then $v \in L$. The language L is locally testable if it is n-locally testable
for some n. Locally testable languages are widely used in the fields of learning
and pattern matching, whereas piecewise testable languages form the first level
of the dot-depth hierarchy.
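
Both definitions say that membership depends only on a finite "profile" of the word. The following sketch computes these profiles for words given as strings. The function names are chosen here, and the convention that a word shorter than n - 1 is its own prefix and suffix is an assumption of this sketch.

```python
from itertools import combinations

def piecewise_profile(u, n):
    """Set of subwords of u of length at most n."""
    return {"".join(c) for k in range(n + 1) for c in combinations(u, k)}

def local_profile(u, n):
    """Factors of u of length at most n, with prefix and suffix of length n - 1."""
    factors = {u[i:i + k] for k in range(1, n + 1) for i in range(len(u) - k + 1)}
    prefix = u[:n - 1]
    # Assumption: a word shorter than n - 1 is taken as its own suffix.
    suffix = u[len(u) - (n - 1):] if n - 1 <= len(u) else u
    return factors, prefix, suffix
```

Two words with the same profile cannot be separated by an n-piecewise (respectively n-locally) testable language. For example, abab and ababab have the same local profile for n = 2, so every 2-locally testable language containing one contains the other.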

Results of Simon and McNaughton, Ladner (see [33,67,69]) show that both
these properties are characterized by algebraic properties of syntactic monoids.
More precisely, a language L is piecewise testable if and only if every principal
two-sided ideal of its syntactic monoid $S(L)$ admits a single generator. The
language L is locally testable if and only if $S(L)$ is aperiodic and $eS(L)e$ is an
idempotent commutative monoid, for each idempotent $e \neq 1$ in $S(L)$.

Varieties of Languages. Many more examples can be found in the literature, 
where natural algebraic properties of finite monoids match natural combinatorial 
or logical properties of languages (see for instance [53,67,69]). The scope of this
matching is described in Eilenberg's variety theorem; the latter identifies the 
closure properties on classes of recognizable languages and on classes of finite 
monoids, that characterize the classes that can occur in this correspondence. 
These classes are called, respectively, varieties of recognizable languages and 
pseudovarieties of finite monoids. 

Decision Procedures. The varieties of recognizable languages thus identified 
by algebraic means are all the more interesting if they are decidable. Since the 
syntactic monoid of a recognizable language is computable, this reduces to decid- 
ing the membership of a finite monoid in certain pseudovarieties. In fact, in the 
examples surveyed above, this is the only path known to a decision algorithm. 

Let us now assume that we are considering a decidable pseudovariety of 
monoids (and hence a decidable variety of languages). The syntactic monoid of
a language L, which is the transition monoid of the minimal automaton of L, 
may have a size exponential in the number of states of that automaton. Thus 



deciding whether a recognizable language given by a deterministic finite state 
automaton lies in a given variety, seems to require exponential time and space. 

In view of the connection between syntactic monoid and minimal automaton, 
it is possible to translate the relevant algebraic property of finite monoids to a 
property of automata, and to check this property on the minimal automaton. 
This possibility is explicitly stated in Theorem 1.2, but it is also the underlying
reason for the decision procedures concerning piecewise testable (Stern [76]) and
locally testable languages (Kim, McNaughton, McCloskey [50]).

In many important situations, this leads to polynomial time membership al- 
gorithms: piecewise and locally testable languages, FO2, certain varieties related 
to the dot-depth hierarchy [70], etc. One major exception though, is the class 
of star-free languages, for which the membership problem is PSPACE-complete 
(Cho, Huynh [14]). In other words, given a deterministic automaton, there is no
fundamentally better algorithm to decide whether the corresponding language 
is star-free, than to verify whether the syntactic monoid is aperiodic. 

It must be stressed that even in the cases where we have polynomial
membership algorithms, these algorithms are a translation to automata of algebraic
properties of the syntactic monoid; they were not discovered until after the
corresponding pseudovariety of monoids was identified; and their natural justification
is via monoid-theoretic considerations.

1.3 Recognizable and Context-Free Languages 

Recognizable languages form but the lowest level of the Chomsky hierarchy, 
where the next level consists of the context-free languages. Context-free languages 
are defined by context-free grammars, which can be viewed, with a more algebraic
mindset, as finite systems of polynomial equations of the form $x_i = \sum p(x)$
($1 \leq i \leq n$) where $x = (x_1, \ldots, x_n)$ is the vector of variables, the summations
are finite and each p is a word over the letters of A and the variables (see [11]).
A solution of such a system is a vector of languages $L = (L_1, \ldots, L_n)$, and the
context-free languages arise as the components of maximal solutions of such
systems. Accordingly, context-free languages are also called equational, or algebraic.
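
A one-variable worked example may help; it is an assumption chosen for illustration, over the alphabet A = {a, b}. The system below has a single solution, which is therefore also its maximal solution: any solution contains the empty word 1, and an induction on the length of words shows that its other elements are exactly the words $a w b$ with w in the solution.

```latex
x_1 = a\,x_1\,b + 1,
\qquad
L_1 = \{\, a^n b^n \mid n \geq 0 \,\}.
```

This language is the classical example of a context-free language that is not recognizable.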

Recognizable languages are components of maximal solutions of certain
simpler systems, where each $p(x)$ is a word of the form $x_j u$, $1 \leq j \leq n$, $u \in A^*$
(right-linear equation). In particular, not all context-free languages are
recognizable. The class of context-free languages is not closed under intersection, but
it is closed under intersection with recognizable languages.

2 Almost as Established: the Finite Tree Case 

Tree languages were considered in the early 1960s. Here we use a ranked
alphabet $\Sigma$, that is, a set equipped with an arity function $\sigma\colon \Sigma \to \mathbb{N}$. A $\Sigma$-term
is defined recursively as follows: every letter of arity 0 (a constant) is a $\Sigma$-term,
and if $a \in \Sigma$ has arity n and $t_1, \ldots, t_n$ are $\Sigma$-terms, then $a(t_1, \ldots, t_n)$ is a
$\Sigma$-term. Terms are naturally (and unequivocally) represented by $\Sigma$-labeled trees,
where an a-labeled node has $\sigma(a)$ linearly ordered children. We let $T_\Sigma$ be the
set of all $\Sigma$-terms.

Thatcher and Wright introduced a model of automata for $\Sigma$-labeled trees,
the so-called bottom-up automata [39,79]. To describe their expressiveness, they
used the natural algebraic structure on the set of $\Sigma$-terms: each element $a \in \Sigma$
is an operation, of arity $\sigma(a)$, and no relation is assumed to hold between these
operations. Now, let a $\Sigma$-algebra be any set S equipped with a $\sigma(a)$-ary operation
$a^S$ for each $a \in \Sigma$. If we use this algebraic notion, we can define recognizable
and equational sets of $\Sigma$-terms: a subset $L \subseteq T_\Sigma$ is said to be recognizable if
there exists a morphism (of $\Sigma$-algebras) $\varphi$ from $T_\Sigma$ to a finite $\Sigma$-algebra, such
that $L = L\varphi\varphi^{-1}$; and L is equational if it is a component of a vector of maximal
solutions of a system of polynomial equations. These systems are defined as in
Sect. 1.3, except that the parameters p are now taken to be terms rather than
words. Note that again, given $L \subseteq T_\Sigma$, there exists a unique least $\Sigma$-algebra
recognizing it, called the syntactic $\Sigma$-algebra of L.
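
To make the definition concrete, here is a minimal sketch of recognition by a finite algebra. The ranked alphabet (Boolean connectives and constants), the two-element algebra and the encoding of terms as nested tuples are all assumptions chosen for this example.

```python
# Assumed ranked alphabet: letter -> arity.
ARITY = {"and": 2, "or": 2, "not": 1, "true": 0, "false": 0}

# A finite algebra on the set {False, True}: one operation per letter,
# with as many arguments as the arity of the letter.
OPERATIONS = {
    "and": lambda x, y: x and y,
    "or": lambda x, y: x or y,
    "not": lambda x: not x,
    "true": lambda: True,
    "false": lambda: False,
}

def evaluate(term):
    """Image of a term under the unique morphism from the terms to the algebra.

    A term is a tuple (letter, subterm_1, ..., subterm_n), n = ARITY[letter].
    """
    letter, *subterms = term
    if len(subterms) != ARITY[letter]:
        raise ValueError(f"{letter!r} expects {ARITY[letter]} arguments")
    return OPERATIONS[letter](*(evaluate(t) for t in subterms))

ACCEPTING = {True}  # subset of the algebra defining the recognized set

def recognized(term):
    """Membership in the set of terms whose value is True."""
    return evaluate(term) in ACCEPTING
```

The recursive evaluation proceeds from the leaves to the root, which is also how a deterministic bottom-up automaton with states False and True would process the corresponding tree.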

Thatcher and Wright also described subsets of $T_\Sigma$ by generalized rational
expressions, involving the letters, unions, the $\Sigma$-operations, and an appropriate
notion of iteration.

Finally, Doner considered a logical formalism to be applied to the trees
representing $\Sigma$-terms: the individual variables are interpreted as nodes in a finite
tree and the predicates are interpreted to express the labeling function and the
parent-child relation.

Results of Doner, Thatcher and Wright [29,39,79] prove the following
statement.

Theorem 2.1. Let $L \subseteq T_\Sigma$. Then L is recognized by a bottom-up automaton,
if and only if L is recognized by a finite $\Sigma$-algebra, if and only if the syntactic
$\Sigma$-algebra of L is finite, if and only if L is described by a generalized rational
expression, if and only if L is defined by an MSO formula, if and only if L is
equational.

Moreover, there are algorithms to pass from any one of these specification
formalisms to any other.

Note that the particularity of this setting is that equational sets are
recognizable. Another important remark is that deterministic bottom-up automata
are really $\Sigma$-algebras, so the notions of automata-theoretic and algebraic
recognizability are not really distinct. This last point makes Theorem 2.1 a little
less satisfying than its word counterpart. Another (subjective) cause of
dissatisfaction is that the generalized rational expressions are rather awfully complex.
Finally, this result has not made it easy to classify term languages in the spirit
of Sect. 1.2. This may be due to a shorter history of investigating the
structural properties of finite $\Sigma$-algebras. Some interesting related results on binary
trees, expressed in terms of certain context-free languages of words, were proved
by Beaudry, Lemieux, Thérien [5,6,7,8]. Nevertheless, it is fair to say that no
structural theory of $\Sigma$-algebras clearly emerges.



An open question which may serve as a benchmark in this direction is the 
following: given a recognizable tree language, can one decide whether it is FO- 
definable? 

3 The General Notion of Recognizability 

Adapting the discussion in Sect. 2, one can easily define recognizable and
equational subsets in any algebra. Recognizable sets are defined in terms of
morphisms into finite algebras of the same type (or in terms of finite index
congruences), and equational sets in terms of systems of polynomial equations (Mezei,
Wright, Courcelle). With those definitions, recognizable sets form a
Boolean algebra, equational sets are closed under union, recognizable sets are
always equational, finite sets are equational (even though they may fail to be
recognizable), products of equational sets (using the operations in the algebra under
consideration) are equational, but the analogous statement for recognizable sets
is not always true, and the intersection of a recognizable and an equational set
is equational. Finally, if $\varphi\colon S \to T$ is a morphism between algebras of the same
type, then $L\varphi^{-1}$ is recognizable if $L \subseteq T$ is recognizable, and $L\varphi$ is equational
if $L \subseteq S$ is equational.

3.1 Choosing an Algebraic Structure 

As discussed above, if the sets we consider are naturally contained in an algebra, 
the notion of recognizability is straightforward. Sometimes however (frequently 
maybe), we want to discuss sets of relational structures, and we then design an 
algebraic signature to combine these structures. 

For instance, it is one such abstract construction that has us see trees as
terms. Consider even finite words: the interest of the model maybe lies simply
in the notion of a totally ordered A-labeled finite set. We chose to view the set
of words as a monoid under concatenation, and this gave rise to the notion of
algebraically recognizable languages discussed in Sect. 1. We could also consider
the following algebraic structure on the set of words: each letter $a \in A$ defines
a unary operation $u \mapsto ua$. Then the set of all finite words is the algebra
generated by A and the constant 1 (this amounts to considering the set of words
as the algebra of $(A \cup \{1\})$-terms). One can verify that the notion of
recognizable language is not modified - another sign of the robustness of the model of
words. In fact, finite $(A \cup \{1\})$-algebras are naturally identified with
deterministic finite state automata, and the equivalence between this notion of algebraic
recognizability and the monoid-based one is a rephrasing of Kleene's theorem.

Relational structures, in the sense of this paper, are sets equipped with rela- 
tions from a given relational signature. For instance, as mentioned earlier, words 
are A-labeled totally ordered sets: the relational signature consists of the binary 
order relation and one labeling unary relation $R_a$ for each letter $a \in A$. In trees,
the relations are the labeling relations and the parent-child binary relation (or
the predecessor relation, or the parent and the sibling relations, etc. - these
choices are equivalent when it comes to expressing properties in monadic second
order logic, see Sect. 3.3).

The choice of an algebraic structure can be guided by the natural construc- 
tions generating the finite relational structures under consideration (concatena- 
tion of words; construction of terms; construction of a word letter by letter), but 
there is really nothing canonical or unique about the algebraic structure. 

Suppose for instance that we consider very few operations: then there are
many more recognizable sets, maybe to the extent that every set is recognizable
(for example, consider the set of words with no operations at all: every finite
partition, say L and its complement $L^c$, is a finite index congruence). If on the
other hand we have too many operations, then there will be fewer recognizable sets.
For instance, let sh (for shift) be the unary operation on words that fixes 1, and maps ua
to au ($a \in A$, $u \in A^*$). One can verify that the set $a^*b$ is not recognizable for
the algebra whose signature consists of the concatenation product and the shift
operation. Another extreme example is given by $\mathbb{N}$, equipped with the constant 0
and the unary predecessor and successor operations: then the only recognizable
sets are $\emptyset$ and $\mathbb{N}$.

It may also happen that adding certain operations does not change the class of
recognizable sets. For instance, adding the mirror operation (defined inductively
by $\tilde{1} = 1$ and $\widetilde{ua} = a\tilde{u}$) to the concatenation product does not alter the notion
of recognizability. See also Sect. 4.3.

In Sect. 4 we discuss a number of relational structures for which very
interesting notions of recognizability have emerged in the literature.

3.2 Multi-Sorted Algebras, Ordered Algebras, etc 

Sometimes, algebras are too constrained: the domain and the range of certain 
natural operations may consist of certain kinds of elements only. This is taken 
care of by the definition of multi-sorted algebras, see [17,25].

A typical example is provided by the study of languages of infinite words
(more details are given in Sect. 4.1). It turns out that the best algebraic
framework consists of considering simultaneously the finite and infinite words.
One relevant operation is the concatenation product: between two finite words,
it is the usual, fundamental operation, yielding a finite word; the product uv,
where u is a finite word and v is infinite, is an infinite word; and while it is
possible to define the product of two infinite words, the outcome of such a
product carries no significant information (uv = u) and the operation can be
discarded. So we find that we need to consider two sorts of elements, finite and
infinite, and two binary product operations, of type finite × finite → finite
and finite × infinite → infinite. We also need to consider the ω-power, a unary
operation of type finite → infinite (since it turns a finite word into an
infinite one).
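
The two sorts and the typed operations can be illustrated with a small sketch.
It is restricted, as an assumption of the sketch, to ultimately periodic
infinite words, written as a finite prefix followed by a repeated non-empty
period. Finite words are ordinary strings, so the product of type
finite × finite → finite is string concatenation. The names Omega, omega_power,
mixed_product and first_letters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Omega:
    """Ultimately periodic infinite word: prefix followed by period repeated forever."""
    prefix: str
    period: str  # assumed non-empty


def omega_power(u: str) -> Omega:
    """Operation of type finite -> infinite."""
    if not u:
        raise ValueError("the omega-power needs a non-empty finite word")
    return Omega("", u)


def mixed_product(u: str, w: Omega) -> Omega:
    """Operation of type finite x infinite -> infinite."""
    return Omega(u + w.prefix, w.period)


def first_letters(w: Omega, n: int) -> str:
    """The first n letters of the infinite word w."""
    out = w.prefix
    while len(out) < n:
        out += w.period
    return out[:n]
```

For example, mixed_product("ab", omega_power("c")) represents the word
a b c c c ... No product of two infinite words is provided, in line with the
remark that this operation can be discarded. Equality of Omega values is
equality of representations, not of the words they denote.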

Another example is discussed in Sect. 4.3, where algebras with infinitely
many sorts are considered. We do not want to give here a detailed discussion
of congruences in multi-sorted algebras, only pointing out that such congruences
can only identify elements of the same sort. If there are finitely many sorts,
recognizability is defined by considering morphisms into finite algebras, or
finite-index congruences. If the algebraic signature under consideration has
infinitely many sorts, non-trivial algebras are usually not finite, and we
consider locally finite algebras (in which each sort has a finite number of
elements) and locally finite index congruences (with a finite number of classes
in each sort).

In the mid-1990s, Pin introduced the usage of ordered semigroups to refine
the classification of recognizable languages [011. The same idea can as naturally
be used in any algebra (and has been, for instance, in [55]), but we will keep it
outside the discussion in this paper to avoid increased complexity.

3.3 Definability vs. Recognizability 

Based on the examples of words and trees, the natural language for logical
definability of recognizable sets would seem to be MSO, monadic second order
logic. It is actually more natural to use CMSO, counting monadic second order
logic. CMSO [16,19] is monadic second order logic, enriched with the modulo
quantifiers ∃^{mod q} x introduced in Sect. ^ In the case of words, CMSO is
equivalent to MSO. In fact, this holds for any relational structure that comes
equipped with a linear order, or for which a linear order can be defined by an
MSO-formula (e.g. A-labeled trees as in Sect. 2 or traces as in Sect. 4.2), but
it is not true in general.

For instance, when discussing multisets (subsets of A with multiplicity), we
can view them as A-labeled finite discrete graphs (graphs without edges). Then
MSO can only define finite and cofinite sets, and it is strictly weaker than CMSO.
Note that, algebraically, the multisets on A under union form the free
commutative monoid on A. The same monoid can be interpreted in terms of traces
(with a commutative alphabet): its elements are then viewed as certain directed
acyclic graphs with one connected component per letter (see Sect. 4.2 on traces),
and MSO is equivalent to CMSO in this context. The recognizable subsets are the
same in both interpretations, since their definition is given in terms of the same
algebraic structure, that of the free commutative monoid over A, but
recognizability is equivalent to CMSO-definability in one interpretation, and to
MSO-definability in the other.
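
As a small illustration (a sketch under stated assumptions, with hypothetical
function names), take multisets represented by Python Counter objects, with
multiset union as the monoid operation. Counting the occurrences of a letter
modulo q is a morphism into the finite monoid Z/qZ, so the set of multisets
with an even number of a's is recognizable. Over the one-letter alphabet {a},
this set is neither finite nor cofinite, which is the kind of set that calls
for a modulo counting quantifier in the discrete graph interpretation.

```python
from collections import Counter


def union(m1: Counter, m2: Counter) -> Counter:
    """Multiset union: multiplicities are added."""
    return m1 + m2


def count_mod(m: Counter, letter: str, q: int) -> int:
    """Morphism from multisets under union to Z/qZ under addition."""
    return m[letter] % q


def even_number_of_a(m: Counter) -> bool:
    """Membership in the set recognized by count_mod with letter 'a' and q = 2."""
    return count_mod(m, "a", 2) == 0
```

The morphism property reads count_mod(union(x, y), a, q) =
(count_mod(x, a, q) + count_mod(y, a, q)) mod q for all multisets x and y.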

Say that a map φ: S → T between sets of relational structures is an MS-
transduction if there exist MSO-formulas (in the language of the relational
structures in S) that express each sφ (its domain and its relations) as a subset
of a direct product of a fixed number of copies of s (see Courcelle JHl for a
precise definition). For instance, if A_0 is the subset of constants in an
alphabet A, the word in A_0* formed by the leaves of an A-labeled tree t can be
easily described by MSO-formulas inside the set of nodes of t.

Now consider a set of relational structures M, equipped with an algebraic
structure with signature Σ. A simple example is given by M = A*, the set of
words on alphabet A, seen as a monoid: the signature Σ consists of a binary
operation (interpreted in A* as concatenation) and of |A| constant symbols
(interpreted in A* as the letters of A). The valuation morphism val maps every
Σ-term (a Σ-labeled tree) to its interpretation in M. The following result is due
to Courcelle fTMn) .



Theorem 3.1. If the valuation morphism is surjective and is an MS-transduction,
then every CMSO-definable subset of M is recognizable.

The mechanism of the proof is worth sketching: let L ⊆ M be CMSO-
definable. The inverse image of a CMSO-definable set by an MS-transduction
is CMSO-definable, so val^{-1}(L) is CMSO-definable in T_Σ. But in the set of
Σ-labeled trees, CMSO-definability is equivalent to MSO-definability, and hence to
recognizability. And it is easy to show that if val^{-1}(L) is recognizable, then so
is L.

Examining this sketch of proof also sheds light on the decidability of CMSO-
defined sets, and on the complexity of such a decision problem. Suppose we have
a parsing algorithm, which maps a given relational structure x ∈ M to a Σ-term
parse(x) describing it. Let L ⊆ M be described by a CMSO-formula φ and let
x ∈ M: we want to decide whether x ∈ L. An MSO-formula ψ describing val^{-1}(L)
can be computed from φ and the formulas describing the MS-transduction val.
The problem then reduces to deciding whether parse(x) satisfies ψ, and by
Theorem 2.1 this can be solved (efficiently) by running parse(x) through a
bottom-up tree automaton.
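
The last step of this procedure, running a term through a deterministic
bottom-up tree automaton, can be sketched as follows. The parsing algorithm and
the construction of the automaton from ψ are not shown. The transition table
below is a toy example chosen for illustration (Boolean terms that evaluate to
true); it is an assumption of the sketch and does not come from any particular
formula. Terms are nested tuples (symbol, child, ..., child), and constants are
plain strings.

```python
# Toy transition table: delta[(symbol, state of child 1, ...)] = state.
# States are 0 and 1; the table evaluates Boolean terms.
DELTA = {
    ("true",): 1,
    ("false",): 0,
    ("not", 0): 1,
    ("not", 1): 0,
    **{("and", p, q): p & q for p in (0, 1) for q in (0, 1)},
    **{("or", p, q): p | q for p in (0, 1) for q in (0, 1)},
}
ACCEPTING = {1}


def run(term, delta=DELTA):
    """Compute the state reached at the root, children first (bottom-up)."""
    if isinstance(term, str):  # constant symbol (leaf)
        return delta[(term,)]
    symbol, *children = term
    return delta[(symbol, *(run(child, delta) for child in children))]


def accepts(term, delta=DELTA, accepting=ACCEPTING):
    """A term is accepted when the state at its root is accepting."""
    return run(term, delta) in accepting
```

Each node is visited once, so with constant-time table lookups the run is
linear in the size of the term, which is one way to read the word
"efficiently" above.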

The converse of Theorem 3.1 does not always hold: there are situations, in
particular in the discussion of languages of graphs, where some recognizable sets
are not CMSO-definable. However, the two notions are known to be equivalent
in important cases: we have already seen it for words or trees; other interest-
ing situations are discussed in Sect. 4.2 and IT^ It is interesting to note that
a common feature of those situations where the notions of definability and rec-
ognizability coincide, is that we are able to describe a parsing function parse as
an MS-transduction, and this is possible only because some finite generation
condition is assumed to hold (which cannot be assumed for the class of all finite
graphs).

For some of the specific relational structures discussed in the sequel, there 
is a notion of automaton that matches the definition of recognizability - but in 
many other situations, especially when dealing with graphs or posets, no such 
notion is known. In those cases, the algebraic approach is really the only tool 
we have to characterize logical definability, and to hope to bring about decision 
algorithms. 

4 Recognizable Sets of Discrete Structures 

For the discrete structures discussed in this section, fruitful algebraic struc- 
tures have been introduced in the literature. The first measure of the interest of 
such algebraic structures, is whether the corresponding notion of recognizability 
matches some natural notion of logical definability, or some natural notion of 
recognizability by automata. A second measure of interest is whether the
algebraic theory thus introduced allows us to characterize (and if possible
decide) significant classes of recognizable sets. Typically, deciding
FO-definability is a key problem, but other classes may arise naturally
depending on the type of discrete structures we consider.

4.1 Infinite Words 

We start with infinite words because it is an area where the theory has been
developed for a long time (Büchi's theorem goes back to the early 1960s), and
has a strong algebraic flavor. Here we are talking of one-way infinite words,
or ω-words, that is, A-labeled infinite chains, or elements of A^ω. For a detailed
presentation of the results surveyed in this section, we refer the readers to Perrin
and Pin's book and to the survey papers [65,68].

The notions of Büchi and (deterministic) Muller or Rabin automata were
developed in the 1960s, and they were proved to have the same expressive power
as MSO-formulas (on A-labeled infinite chains), and as ω-rational expressions.
The latter describe every MSO-definable language of ω-words as a finite union
of products of the form KL^ω, where K, L are recognizable languages of finite
words and the ω-power denotes infinite iteration. In particular, this indicates
that the sets of ω-words that can be accessed by MSO or automata-theoretic
specifications are in a sense ultimately infinite iterations of a recognizable set of
finite words.

An algebraic approach to ω-rational languages took longer to evolve. Early
work of Arnold, Pécuchet, Perrin emphasized the necessary interplay of
relations on A^ω (concerning infinite words) and ordinary monoid congruences on
A* (concerning finite words). It also emphasized that nothing much could be
expected from the monoid structure of A^ω, in which every product is equal to its
first factor. Eventually, it was recognized that finite and infinite words cannot
be considered separately, but that they form a two-sorted algebra, as explained
in Sect. 3.2. The definition of the binary concatenation product does not pose
any problem, but must be split into one operation of type finite² → finite and
one operation of type finite × infinite → infinite. But the generation of
infinite words from finite ones can be envisaged in two fashions: we can
consider an ω-ary product, of type finite^ω → infinite, or the unary ω-power
operation, of type finite → infinite. The first choice is termed an ω-semigroup
(Perrin, Pin), and A^∞ = A^+ ∪ A^ω is (freely) generated by A as an ω-semigroup.
The second choice is termed a Wilke algebra (Wilke), and the sub-Wilke algebra
of A^∞ generated by A consists of the finite words and the ultimately periodic
ω-words only. A Ramsey theorem shows however that on a finite set, a Wilke
algebra structure can be canonically extended to an ω-semigroup structure, so
that the consideration of these two algebraic structures yields the same class
of recognizable languages.

The robustness of this algebraic approach to recognizable subsets of A^∞
(and not A^ω!) is such that an Eilenberg-style theory of varieties was developed
(see Sect.nj, and that a good number of combinatorially or logically interesting
classes of recognizable sets have been characterized algebraically (Perrin and Pin
|M|, Carton J^).



From the algorithmic point of view, note that passing from a Büchi
automaton to a deterministic Muller or Rabin automaton (say, for the purpose of
complementation) is notoriously difficult: see Safra's exponential time
algorithm, and no significantly better algorithm is possible [85].

Elegant results generalize this discussion to transfinite words, that is,
A-labeled ordinals longer than ω, see the work of Bedon, Bruyère, Carton, Choueka
HSJ 10 13 .




For infinite trees, we know models of automata that are equivalent to MSO-
definability (Rabin, see |HSl), but the extension of the algebraic ideas sketched
above remains to be done. The finiteness results implied by Ramsey's theory
seem much harder to obtain for trees.

4.2 Poset-Related Models 

A pomset (partially ordered multiset) is an A-labeled poset. The first example, 
of course, is that of words, which are A-labeled chains. Other examples were 
considered, and first of all the case of traces. 

Traces. There the alphabet is equipped with a structure - which can be viewed 
as an independence relation, or a dependence relation, or a distributed structure. 
A trace can then be viewed in several fashions: as an equivalence class of words 
in the free monoid A*, in the congruence induced by the commutation of inde- 
pendent letters (so traces form a monoid); or as a so-called dependence graph, 
that is, an A-labeled poset where the order is constrained by the distributed 
structure of the alphabet, see Diekert and Rozenberg's book [27]. The latter is
the more significant model, from the point of view of the original motivation of 
traces as a model of distributed computation. 
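
The first point of view (a trace as an equivalence class of words) can be made
concrete with a classical projection criterion: two words over A represent the
same trace if and only if they have the same projection onto every pair of
dependent letters, where a letter is taken to be dependent on itself. The
sketch below uses this criterion; the function names and the encoding of the
independence relation as a set of two-letter frozensets are assumptions of the
sketch, and both words are assumed to use only letters of the given alphabet.

```python
from itertools import combinations_with_replacement


def projection(word: str, letters: set) -> str:
    """Erase from word every letter that is not in letters."""
    return "".join(c for c in word if c in letters)


def trace_equivalent(u: str, v: str, alphabet: set, independent: set) -> bool:
    """Decide whether u and v are linearizations of the same trace.

    independent is a set of frozensets {a, b} of distinct independent letters;
    every other pair of letters (including a letter with itself) is dependent.
    """
    for a, b in combinations_with_replacement(sorted(alphabet), 2):
        if a != b and frozenset((a, b)) in independent:
            continue  # independent letters impose no ordering constraint
        if projection(u, {a, b}) != projection(v, {a, b}):
            return False
    return True
```

With alphabet {a, b, c} and a, b independent, the words abc and bac are
equivalent, while abc and acb are not, since b and c are dependent.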

The power of MSO-definability - interpreted on the dependence graph model 
- was proved to be equivalent to the power of Zielonka's automata (a model of 
automata which incorporates information on the distributed structure of
the alphabet), and to algebraic recognizability in the trace monoid [27].

Note that, as discussed in Sect. 3.3, in the particular case where the letters
are independent from one another, the trace monoid is the free commutative
monoid. When elements of this monoid are represented by trace dependence
graphs, where for each letter a ∈ A, the set of a-labeled elements is a chain,
then antichains have bounded cardinality (that of A), and a linearization of
the poset can be defined by an MSO-formula, so MSO-definability is equivalent
to CMSO-definability. When the elements of the same monoid are represented
by finite discrete A-labeled graphs, without any edges, then MSO-definability is
strictly weaker than CMSO-definability. In both cases however, recognizability
is equivalent to CMSO-definability.

Good results are also known for FO-definable trace languages: they are char- 
acterized by star-free rational expressions, and by the aperiodicity of their syn- 
tactic monoid (Guaiana, Restivo, Salemi 02]), and important temporal logics 
with the same expressive power have been developed (see Thiagarajan and 
Walukiewicz [H2| and Diekert and Gastin [26]).




There is a large body of literature on recognizable trace languages, and the 
results summarized above point to a rather well understood situation. Some 
questions however are not solved in a completely satisfactory fashion. For in- 
stance, the question of rational expressions for trace languages remains unclear 
(see the star problem): the difficulty comes from the fact that the star of a rec- 
ognizable trace language may not be recognizable; the notion of concurrent star, 
which takes care of that obstacle, retains an ad hoc flavor [27]. Similarly, with
the remarkable exception of FO-definable trace languages, the task of identify- 
ing, characterizing and deciding interesting subclasses of recognizable languages 
has eluded efforts. 

One can argue that this is due to the loss of information that occurs if we 
consider the set of traces as a monoid - which we must do if the algebraic 
structure on the set of traces is that of a monoid: in the resulting definition of 
recognizability, a set of traces is recognizable if and only if its set of linearizations 
(in A*) is recognizable. From an algebraic point of view, this puts the emphasis 
on commutation, but two traces may commute because they are independent, or 
because they are powers of a third one, in which case they are deeply dependent. 
From a more algorithmic point of view, what is done there is to reduce the study 
of a trace language to the study of the language of all its linearizations. 

On the other hand, Zielonka's automata succeed in taking into account the 
distributed structure of the computation model, and are well-adapted to traces. 
Since they match monoid recognizability all the same, this points to the following 
problem: to find an alternative algebraic structure on the set of traces, which does 
not change the family of recognizable sets, yet better accounts for the distributed 
nature of that model, and hence (hopefully) naturally connects with Zielonka's 
automata (i.e., provides an algebraic proof of Zielonka's theorem) and allows 
the identification and characterization of structurally significant subclasses of 
recognizable trace languages. 

Infinite traces exhibit interesting properties, from the point of view of
automata-recognizability and logical definability, see |27I3(J) .

Message Sequence Charts and Communication Diagrams. Message se- 
quence charts (MSCs) form a specification language for the design of communi- 
cation protocols, that has attracted a lot of attention in the past few years. They 
can also be considered as specifications of particular pomsets, that are disjoint 
unions of k chains. An abstraction of this model is given by Lamport Diagrams 
(LDs) and by Layered Lamport Diagrams (LLDs), which are LDs subject to a 
boundedness condition. 

Henriksen, Kumar, Mukund, Sohoni, Thiagarajan [43,44,62] considered the
class of bounded finite MSC languages, defined by so-called bounded (Alur,
Yannakakis jH]) or locally synchronised (Muscholl, Peled [HSI) MSC-graphs. For
bounded MSC languages, MSO-definability is equivalent to rationality of the
language of all linearizations, and to recognizability by deterministic (resp.
non-deterministic) message-passing automata. Kuske extended these results to
FO-definable MSC languages, and to infinite bounded MSCs [SS|.



The restriction to classes of posets with a rational language of linearizations 
is rather severe, but little work so far has discussed definability or recognizability 
outside this hypothesis. Meenakshi and Ramanujam and Peled [SJ investi- 
gated decidable logics for MSCs and LLDs, that are structural, i.e., not defined 
on the language of linearizations. There does not yet seem to exist an algebraic
approach to (a subclass of) LDs that would match the power of MSO-definability.

Series-Parallel Pomsets. Sets of series-parallel pomsets (or sp-languages)
were investigated by Lodaya, Weil, Kuske [52,55]. A poset is series-parallel if
it can be obtained from singletons by using the operations of sequential and
parallel product. There is a combinatorial characterization of these posets
(N-free posets 1411871 ), but the definition above naturally leads to the
consideration of the so-called series-parallel algebras 15 5^ , that is, sets
equipped with two binary associative operations, one of which is commutative.
Kuske showed that an sp-language is recognizable if and only if it is
CMSO-definable [S^]. Lodaya and Weil introduced a model of branching automata
and a notion of rational expressions, which they proved had the same expressive
power ^5 . However these automata accept not only the recognizable sp-languages,
but also some non-recognizable ones.

The bounded-width condition is a natural constraint on sp-languages: a set L
of series-parallel pomsets has bounded width if there is a uniform upper bound on
the cardinality of an antichain in the elements of L. Results of Kuske, Lodaya and
Weil [52,55] show that when we consider only bounded-width sp-languages, then
recognizability is equivalent to automata-recognizability, to MSO-definability,
and to expressibility by a so-called series-rational expression. Under the
bounded-width hypothesis, FO-definable sp-languages are characterized by a notion
of star-free rational expressions, and by an algebraic condition on the syntactic
sp-algebra which is analogous to the aperiodicity of monoids [22].
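
The width of a series-parallel pomset can be read off any term that builds it
from singletons: a sequential product has the larger of the two widths, since
every element of the first factor is below every element of the second, and a
parallel product has the sum of the two widths. The sketch below encodes such
terms with three hypothetical classes (Atom, Seq, Par); it illustrates the
bounded-width condition and is not the algebraic structure of the cited papers.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """A singleton pomset carrying one letter."""
    label: str


@dataclass(frozen=True)
class Seq:
    """Sequential product: all of left is ordered before all of right."""
    left: "SP"
    right: "SP"


@dataclass(frozen=True)
class Par:
    """Parallel product: left and right are mutually incomparable."""
    left: "SP"
    right: "SP"


SP = Union[Atom, Seq, Par]


def width(t: SP) -> int:
    """Maximal cardinality of an antichain of the pomset denoted by t."""
    if isinstance(t, Atom):
        return 1
    if isinstance(t, Seq):
        return max(width(t.left), width(t.right))
    return width(t.left) + width(t.right)
```

For instance, the set of all parallel products of n copies of a, for n ≥ 1,
contains pomsets of every width n, so it is not a bounded-width sp-language.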

Texts and n-Pomsets. An A-labeled text is a finite A-labeled set, equipped
with 2 linear orders. Texts form a particular class of the 2-structures studied by
Ehrenfeucht, Engelfriet, Harju, Proskurowski and Rozenberg [31,32,34].
Hoogeboom and ten Pas introduced an algebraic structure on the set of all texts
|T7] . This algebra has an infinite signature, but within any finitely generated
sub-algebra (generated by A and any finite subset of the signature: the
hypothesis of bounded primitivity in [47,48]), recognizability is equivalent to
MSO-definability.

The class of texts generated by the alphabet and the two arity 2 operations 
on texts (alternating texts) is of particular interest, as we now discuss. 

A pair of linear orders (<_1, <_2) on a finite set specifies and can be specified
by a pair of partial orders (⊏_1, ⊏_2) such that every pair of distinct elements
is comparable in exactly one of these partial orders (this defines a 2-poset): it
suffices to take <_1 = ⊏_1 ∪ ⊏_2 and <_2 = ⊏_1 ∪ ⊐_2; and conversely
⊏_1 = <_1 ∩ <_2 and ⊏_2 = <_1 ∩ >_2. Since the translation between texts and
2-posets is described by quantifier-free formulas, MSO-definability is preserved
under this translation. On 2-posets, one can consider two natural operations:
one behaves like a sequential product on ⊏_1 and a parallel product on ⊏_2; and
the other is defined dually, exchanging the roles of the two partial orders. Let
SPB(A) be the algebra of 2-posets generated by A and these two operations. Ésik
and Németh observed that these two operations on 2-posets translate to the two
arity 2 operations of the text algebra; moreover, they introduced a simple model
of automata for subsets of SPB(A), whose power is equivalent to recognizability
and to MSO-definability.
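
The translation between texts and 2-posets described above can be sketched as
follows. A text is given here by two lists that enumerate the same elements in
increasing order for the first and for the second linear order; partial orders
are sets of pairs (x, y) meaning that x is strictly below y. The function names
and this encoding are assumptions of the sketch, and labels are omitted since
the translation does not touch them.

```python
from itertools import permutations


def two_poset_from_text(order1, order2):
    """First partial order: both linear orders agree; second: they disagree."""
    p1 = {x: i for i, x in enumerate(order1)}
    p2 = {x: i for i, x in enumerate(order2)}
    po1 = {(x, y) for x, y in permutations(order1, 2)
           if p1[x] < p1[y] and p2[x] < p2[y]}
    po2 = {(x, y) for x, y in permutations(order1, 2)
           if p1[x] < p1[y] and p2[x] > p2[y]}
    return po1, po2


def text_from_two_poset(elements, po1, po2):
    """Inverse translation: the first linear order is the union of the two
    partial orders, the second is the union of po1 and the reverse of po2."""
    lt1 = po1 | po2
    lt2 = po1 | {(y, x) for x, y in po2}

    def linearize(lt):
        # In a strict linear order, elements are sorted by number of predecessors.
        return sorted(elements, key=lambda x: sum(1 for _, z in lt if z == x))

    return linearize(lt1), linearize(lt2)
```

Both directions only test membership of pairs in the given relations, which
reflects the quantifier-free nature of the translation.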

Ésik and Németh's automata can also be defined for n-posets, where they
are also equivalent to recognizability and to MSO-definability.

Pomsets in General. There does not seem to be a natural model of automaton 
that makes sense on all pomsets. However, since pomsets can be represented by 
A-labeled directed acyclic graphs (dags), they are directly concerned by the 
discussion in the next section. In particular, and getting ahead of ourselves, 
let us observe that the subsignature of the modular signature consisting of the 
operations defined by graphs that are posets (resp. dags), generates the class of 
all finite posets (resp. dags), and the results on CMSO-definability discussed in
Sect. 4.3 therefore apply to pomset languages.

4.3 Graphs and Relational Structures 

Graphs (edge- or vertex- labeled, colored, with designated vertices, etc), and be- 
yond them, relational structures (i.e., hypergraphs) are the next step, and they 
occur indeed in many modeling problems. The notion of logical definability is 
rather straightforward, although it may depend on the logical structure we con- 
sider on graphs (whether a graph is a set of vertices with a binary edge predicate, 
or two sets of vertices and edges with incidence predicates). From the algebraic 
point of view, there is no prominent choice for a signature to describe graphs. 
However, three signatures emerge from the literature. One of them, the modular 
signature, arises from the theory of modular decomposition of graphs, the other 
two (the HR- and the VR-signature) arise from the theory of graph grammars.
We will also consider a fourth signature, on the wider class of relational struc- 
tures. We will see that under suitable finiteness conditions, the resulting notions 
of recognizability are equivalent. 

After briefly describing these signatures, and comparing the notions of rec- 
ognizability which they induce, we rapidly survey known definability results. We 
conclude with the discussion of a couple of situations where automata-theoretic 
models have been introduced. 

The Modular Signature. A concrete graph H with vertices {1, ..., n} and
edge set E_H induces an n-ary operation on graphs as follows: the vertex set of
the graph H(G_1, ..., G_n) is the disjoint union of the vertex sets of the G_i, it
contains all the edges of the G_i, and for each edge (i, j) ∈ E_H, it also has all
the edges from a vertex of G_i to a vertex of G_j. A graph is said to be prime if it
cannot be decomposed non-trivially by such an operation. The modular signature
ℱ_∞ consists of the set of all prime graphs, or rather, of one representative of
each isomorphism class of prime graphs. It is an infinite signature. Note that
ℱ_∞ contains a finite number of operations of each arity. The operations of arity
2 are the parallel product, the sequential product and the clique product: they
are defined by the graphs with 2 vertices and no edge, 1 edge and 2 edges,
respectively [18,88].
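
The operation induced by a concrete graph H can be sketched directly from this
definition. In the sketch, a directed graph is a pair (vertices, edges) with
edges a set of ordered pairs, and disjointness of the union is obtained by
tagging each vertex of the i-th argument with the index i. The function name
compose and the three constants are hypothetical.

```python
def compose(h_edges, graphs):
    """Return H(G_1, ..., G_n), where H has vertices 1..n and edge set h_edges."""
    vertices, edges = set(), set()
    for i, (v_i, e_i) in enumerate(graphs, start=1):
        vertices |= {(i, v) for v in v_i}             # tagged copy of the vertices
        edges |= {((i, u), (i, v)) for u, v in e_i}   # all the edges of G_i
    for i, j in h_edges:
        v_i, v_j = graphs[i - 1][0], graphs[j - 1][0]
        # every vertex of G_i gets an edge to every vertex of G_j
        edges |= {((i, u), (j, v)) for u in v_i for v in v_j}
    return vertices, edges


# The graphs with 2 vertices defining the three operations of arity 2.
PARALLEL = set()
SEQUENTIAL = {(1, 2)}
CLIQUE = {(1, 2), (2, 1)}
```

For example, with point = ({0}, set()), the term
compose(SEQUENTIAL, [compose(PARALLEL, [point, point]), point]) builds the
poset with two incomparable elements below a common third one.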

The theory of modular decomposition of graphs [57,61] shows that the class
of all finite graphs is generated by the singleton graph and ℱ_∞, and describes the
relations between the operations in ℱ_∞. Finite A-labeled graphs are generated
by A and ℱ_∞. If ℱ is a finite subset of ℱ_∞, the algebra generated by A and ℱ
is called the class of A-labeled ℱ-graphs.

For instance, if ℱ consists of the sequential product, the ℱ-graphs are the
finite words. If ℱ consists of the parallel (resp. clique) product, they are the
discrete graphs (resp. cliques). If ℱ consists of the parallel and the sequential
products, we get the series-parallel posets (see Sect. 4.2), and if it consists of the
parallel and clique products, we get the cographs (see below).

The Signature HR. Here, graphs are considered as sets equipped with a bi-
nary edge predicate, and a finite number of constants (i.e., designated vertices),
called sources. Each finite set of source names defines a sort in the HR-algebra
GS of graphs with sources. The operations in the algebra are the disjoint union
of graphs with disjoint sets of source names, the renaming of sources, forgetting
sources, and the fusion of two sources [T^. A number of variants can be
considered, which do not affect the class of HR-recognizable subsets, see
Courcelle and Weil [21]: the disjoint union can be replaced with parallel
composition (source name sets need not be disjoint, and sources with the same
name get identified), the sources may be assumed to be pairwise distinct (source
separated graphs), the source renaming operations can be dropped, or the source
forgetting operations, etc.

The signature HR emerged from the literature on graph grammars, and the 
acronym HR stands for Hyperedge Replacement. More precisely, the equational 
sets of graphs, relative to the signature HR, are known to enjoy good closure 
properties, and can be elegantly characterized in terms of recognizable tree lan- 
guages and MS-transductions where both vertex and edge sets can be quantified 
(see Courcelle [T^'). 

The Signature VR. Now graphs are considered as sets equipped with a binary
edge predicate, and a finite number of unary predicates (i.e., colors on the set
of vertices), called ports. Each finite set of port names defines a sort in the
VR-algebra GP of graphs with ports. The operations in the algebra are the disjoint
union, the edge adding operation (adding an edge from each p-port to each
q-port for designated port names p, q), and the renaming and forgetting of port
names. Again, the class of VR-recognizable subsets is not affected by variants
such as the consideration of graphs where ports must cover the vertex set, or
must partition it. It also coincides with NLC-recognizable graphs, see Courcelle
and Weil [HI-
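
A minimal sketch of the VR operations follows, under a simplifying assumption
that matches one of the variants mentioned above: each vertex carries exactly
one port name, so that ports partition the vertex set. A graph with ports is a
pair (ports, edges), where ports maps each vertex to its port name; forgetting
port names is omitted in this variant. All function names are hypothetical.

```python
def vertex(port):
    """A single vertex carrying the given port name."""
    return {0: port}, set()


def disjoint_union(g1, g2):
    """Disjoint union; vertices are tagged with 0 or 1 to keep both sides apart."""
    (ports1, edges1), (ports2, edges2) = g1, g2
    ports = {**{(0, v): p for v, p in ports1.items()},
             **{(1, v): p for v, p in ports2.items()}}
    edges = ({((0, u), (0, v)) for u, v in edges1}
             | {((1, u), (1, v)) for u, v in edges2})
    return ports, edges


def add_edges(g, p, q):
    """Add an edge from each p-port to each q-port (no loops are created)."""
    ports, edges = g
    new = {(u, v) for u, pu in ports.items() for v, pv in ports.items()
           if pu == p and pv == q and u != v}
    return ports, edges | new


def rename(g, p, q):
    """Rename port p into port q."""
    ports, edges = g
    return {v: (q if port == p else port) for v, port in ports.items()}, edges
```

For example, taking the disjoint union of two p-ports and two q-ports and then
applying add_edges with p and q yields the complete bipartite directed graph
with 2 + 2 vertices.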

The signature VR (standing for Vertex Replacement) also emerged from the 
literature on graph grammars, and the equational sets of graphs, relative to the 
signature VR, enjoy good closure properties, and are characterized in terms of 
recognizable tree languages and MS-transductions where only vertex sets can be 
quantified (see Courcelle [19]).

The Signature S on Relational Structures with Sources. Subsuming
the algebras of graphs with sources and with ports, we can consider the class
of relational structures with sources StS. These are sets equipped with a finite
relational structure, and a finite number of constants (sources). Each pair
consisting of a relational signature and a set of source names defines a sort,
and the operations in the signature S are the disjoint union between sorts with
disjoint sets of source names, and all the unary operations that can be defined
on a given sort using quantifier-free formulas, see [21]. The operations in the
signatures VR and HR are particular examples of such quantifier-free definable
operations. The notion of S-recognizability is not affected if we consider
parallel composition instead of disjoint union (as for the signature HR), nor if
we consider only structures where sources are separated [21].

Comparing the Notions of Recognizability. Combining a number of results
of Courcelle and Weil [21], we find that a set of graphs is VR-recognizable if
and only if it is S-recognizable. Moreover, a VR-recognizable set of graphs is
ℱ_∞-recognizable, and it is also HR-recognizable (the implication for HR- and
VR-equational sets goes in the other direction). Finally, ℱ_∞-, HR- and VR-
recognizability (resp. equationality) are equivalent under certain boundedness
conditions.

In particular if we consider a set L of graphs without K_{n,n} for some n (K_{n,n}
is the complete bipartite directed graph with n + n vertices), then L is VR-
recognizable if and only if it is HR-recognizable [21]. This sufficient condition is
implied by the following boundedness properties (in increasingly general order):
the graphs in L have uniformly bounded degree, they have bounded tree-width,
they are sparse. The notion of bounded tree-width can be seen as a finite
generation property, relative to the signature HR [19].

If ℱ is a finite subset of the modular signature ℱ_∞ and L is a set of
ℱ-graphs, then L is ℱ-recognizable if and only if it is ℱ_∞-recognizable (resp.
VR-recognizable, S-recognizable) [18,21].

Monadic Second-Order Definability. From the logical definability point of 
view, graphs can be seen as sets (of vertices) equipped with an edge predicate, 
or as pairs of sets (of vertices and edges respectively) with incidence predicates. 
Let us denote by CMSO[E] the CMSO emerging from the first point of view, and 
by CMSO[inc] the second one. It is easily verified that CMSO[E]-definable sets
of graphs are also CMSO[inc]-definable. Moreover CMSO[E]-definability implies
VR-recognizability, and CMSO[inc]-definability implies HR-recognizability, see
Courcelle [B].

Lapoire showed that if L is a set of graphs with bounded tree-width, then
CMSO[inc]-definability is equivalent to HR-recognizability |^. Moreover, if L
is uniformly sparse, then CMSO[E]- and CMSO[inc]-definability are equivalent
(Courcelle [20]). In view of the equivalence result between HR- and
VR-recognizability mentioned above, it would be interesting to find out whether
both definabilities are also equivalent if L is without K_{n,n} for some n.

Returning to the modular signature, CMSO[E]-definability implies
ℱ_∞-recognizability, for general reasons (see Sect. 3.3). Weil showed that the
converse holds for sets of ℱ-graphs, provided ℱ is finite and the operations of ℱ
enjoy a limited amount of commutativity (weakly rigid signature, see [HE] for
details). This assumption is rather general, and covers in particular all the
cases where ℱ-graphs are dags or posets, and notably the case of sp-languages.

A typical example of a subsignature of ℱ_∞ which is not weakly rigid consists
of the parallel and the clique products, two binary commutative, associative
operations which generate the cographs. Cographs form a class of undirected
graphs, closely related with comparability graphs (Corneil, Lerchs, Stewart [23]),
and can be characterized as follows: an undirected graph is a cograph if and only
if it does not contain P_4 as an induced subgraph (P_4 has vertex set {1, ..., 4}
and edges between i and i + 1, 1 ≤ i < 4). The arguments that show that
ℱ-recognizability is equivalent to CMSO[E]-definability when ℱ is weakly rigid,
fail for cographs. Courcelle [TH| asks whether CMSO-definability is strictly
weaker than ℱ-recognizability for a general finite subsignature ℱ ⊆ ℱ_∞: the
first place to look for a counter-example seems to be cographs.
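
The characterization of cographs by the absence of an induced P_4 can be
checked by brute force on small graphs. The sketch below represents an
undirected graph by a list of vertices and a set of two-element frozensets; it
inspects every 4-element subset of vertices, so it is meant as an illustration
only, not as an efficient recognition algorithm. The function names are
hypothetical.

```python
from itertools import combinations, permutations


def undirected(pairs):
    """Build an undirected edge set from an iterable of vertex pairs."""
    return {frozenset(p) for p in pairs}


def has_induced_p4(vertices, edges):
    """True if some 4 vertices a, b, c, d induce exactly the path a-b-c-d."""
    def adj(u, v):
        return frozenset((u, v)) in edges

    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            if (adj(a, b) and adj(b, c) and adj(c, d)
                    and not adj(a, c) and not adj(a, d) and not adj(b, d)):
                return True
    return False


def is_cograph(vertices, edges):
    """Cograph test through the induced-P_4 characterization."""
    return not has_induced_p4(vertices, edges)
```

For instance, the path with 4 vertices is not a cograph, while the cycle with
4 vertices is one: it is the clique product of two discrete graphs with
2 vertices each.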

Series-Σ Algebras. In their investigation of sp-languages, Lodaya and Weil
introduced series-Σ-algebras and their subsets (sΣ-languages): here Σ is a
ranked alphabet (as in the study of tree languages, Sect. 2) and • is a binary
associative operation not in Σ. The sΣ-terms, that is, the elements of the
algebra freely generated by A, Σ and • (called the free series-Σ-algebra over
A), can be viewed as finite sequences of Σ-trees, where each child of the root
is in fact a smaller sΣ-term. Lodaya and Weil introduced a model of automata
and a notion of rational expressions, both of which are equivalent to
recognizability [56] - a result which generalizes the characterization of
recognizability by finite automata for both words and trees. They showed that
their result could be adapted if some of the operations in Σ were assumed to be
commutative, but not if some amount of associativity was introduced (e.g.
sp-languages, cographs). The logical dimension of sΣ-languages was not
developed.
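
The description of sΣ-terms as finite sequences of Σ-trees can be made
concrete with a small data-structure sketch in Python. The encoding below
(tuples of tagged factors) and the names `letter`, `op` and `series` are
assumptions chosen for illustration; they are not taken from Lodaya and Weil's
work.

```python
def letter(a):
    """The sΣ-term consisting of the single letter a of A."""
    return (("letter", a),)

def op(symbol, rank, *args):
    """Apply an operation of Σ of the given rank to sΣ-terms.

    The result is a sequence of length one: a single Σ-tree whose children
    are themselves (smaller) sΣ-terms.
    """
    if len(args) != rank:
        raise ValueError(f"{symbol} expects {rank} arguments, got {len(args)}")
    return (("op", symbol, args),)

def series(s, t):
    """Series product •: concatenation of sequences, hence associative."""
    return s + t

a, b, c = letter("a"), letter("b"), letter("c")
# f is a hypothetical binary symbol of Σ; its first argument is the term a • b.
t = series(op("f", 2, series(a, b), c), a)
```

With this encoding the associativity of • holds by construction, since
`series(series(a, b), c)` and `series(a, series(b, c))` are the same tuple.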

Automata for Graph Languages. We have seen some automata, designed for
specific situations (finite and infinite words, traces, series-parallel pomsets,
MSC languages, n-pomsets), see Sect. 4.2. As discussed there, these automata
models match the expressiveness of recognizability and MSO-definability,
sometimes under additional boundedness hypotheses.

For general graph languages, Thomas introduced the notion of graph acceptor
[84,86], generalizing the tiling systems introduced earlier by Giammarresi
and Restivo 001 for pictures (A-labeled rectangular grids). Recognizability by a
graph acceptor was shown to be equivalent to EMSO-definability, where EMSO
is the extension of FO by existential quantification of monadic second order
variables.
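
The local check underlying tiling systems can be sketched as follows: a
picture is surrounded by a border symbol, and it belongs to the local language
when every 2×2 window of the bordered picture is an allowed tile. The Python
code below is a minimal sketch under these assumptions; the border symbol `#`,
the encoding of pictures as lists of strings and the function names are
choices made here, and the projection step that turns a local language into a
recognizable picture language is not shown.

```python
def bordered(picture, border="#"):
    """Surround a picture (a list of equal-length strings) with a border."""
    width = len(picture[0])
    top = [border] * (width + 2)
    return [top] + [[border] + list(row) + [border] for row in picture] + [top]

def tiles(picture):
    """Set of all 2x2 windows of the bordered picture."""
    p = bordered(picture)
    return {
        (p[i][j], p[i][j + 1], p[i + 1][j], p[i + 1][j + 1])
        for i in range(len(p) - 1)
        for j in range(len(p[0]) - 1)
    }

def in_local_language(picture, allowed):
    """A picture is accepted when all of its 2x2 windows are allowed tiles."""
    return tiles(picture) <= allowed

# Allowed tiles taken from one sample picture: a square with 1s on the diagonal.
allowed = tiles(["100", "010", "001"])
```

With this tile set, the 2×2 picture `["10", "01"]` is accepted, while
`["01", "10"]` is rejected because its top-left corner window is not an
allowed tile.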

4.4 Revisiting Trees 

As mentioned in Sect. 2, the problem of deciding whether a given recognizable
set of Σ-trees is FO-definable is still open, and various attempts to use the
structure of Σ-algebras described in Sect. 2 to solve it in the spirit of
Schützenberger's theorem (Theorem 1.2) have failed [46,71,72]. Recently, Ésik
and Weil introduced a new algebraic framework to investigate this particular
problem on tree languages [36]. The point was to enrich the algebraic
framework, without modifying the notion of a recognizable subset, but
introducing additional algebraic structure.

Ésik and Weil's algebras, called preclones, are multi-sorted algebras with
one sort for each integer n, and Σ-trees form the 0-sort of the free Σ-generated
preclone. As indicated, a set of Σ-trees is preclone-recognizable if and only if
it is recognizable with respect to Σ-algebras; moreover, if L is a recognizable
Σ-tree language, its syntactic preclone (which is finitary but not finite due to
the infinite number of sorts) admits a finite presentation, encoded in the
minimal bottom-up automaton of L. This is naturally important if we want to use
syntactic preclones in algorithms.
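
As a reminder of what a bottom-up automaton computes, here is a minimal
Python sketch of a deterministic bottom-up tree automaton, viewed as a finite
Σ-algebra in which each tree is evaluated from the leaves up. The encoding of
trees as nested tuples, the sample ranked alphabet (`true`, `false`, `and`,
`or`) and the names `run` and `accepts` are assumptions made for this
illustration only.

```python
def run(tree, delta):
    """Evaluate a tree bottom-up; a tree is a tuple (symbol, child, ...)."""
    symbol, *children = tree
    states = tuple(run(child, delta) for child in children)
    return delta[(symbol, states)]

def accepts(tree, delta, final):
    """The tree is accepted when the state reached at the root is final."""
    return run(tree, delta) in final

# Transition table: states 0 and 1 record the truth value of each subtree.
delta = {("true", ()): 1, ("false", ()): 0}
for x in (0, 1):
    for y in (0, 1):
        delta[("and", (x, y))] = x & y
        delta[("or", (x, y))] = x | y

final = {1}
example = ("or", ("false",), ("and", ("true",), ("true",)))
```

The set of trees accepted with `final = {1}` is the set of Boolean expression
trees that evaluate to true; `example` belongs to it.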

The main result of 2S| states that L is FO-definable if and only if its
syntactic preclone lies in the least pseudovariety of preclones closed under
two-sided wreath products, and containing a certain very simple 1-generated
preclone. The two-sided wreath product is a generalization of the operation of
the same name on monoids (Rhodes, Tilson, see [77]), and this result generalizes
Schützenberger's theorem on finite words. It is the first algebraic
characterization of FO-definable tree languages, but unfortunately, it is not
clear at this point whether this characterization can be used to derive a
decision algorithm.
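
By way of contrast, in the case of finite words the algebraic
characterization does yield a decision procedure: a recognizable language is
FO-definable if and only if its syntactic monoid is aperiodic, and the
syntactic monoid is the transition monoid of the minimal automaton. The Python
sketch below assumes that a complete, minimal deterministic automaton is given
as a table mapping each letter to a tuple of target states; this encoding and
the function names are assumptions made for the illustration.

```python
def compose(f, g):
    """Transformation 'first f, then g' on states 0..n-1."""
    return tuple(g[q] for q in f)

def transition_monoid(n_states, letters):
    """All state transformations induced by words (including the empty word)."""
    identity = tuple(range(n_states))
    seen = {identity}
    frontier = [identity]
    while frontier:
        m = frontier.pop()
        for g in letters.values():
            product = compose(m, g)
            if product not in seen:
                seen.add(product)
                frontier.append(product)
    return seen

def is_aperiodic(monoid):
    """Check that every element m satisfies m^k = m^(k+1) for some k."""
    for m in monoid:
        powers = [m]
        while True:
            nxt = compose(powers[-1], m)
            if nxt == powers[-1]:
                break              # the powers of m stabilize
            if nxt in powers:
                return False       # the powers of m enter a nontrivial cycle
            powers.append(nxt)
    return True

# Minimal complete automata for (ab)* (state 2 is a sink) and for (aa)*.
ab_star = {"a": (1, 2, 2), "b": (2, 0, 2)}
aa_star = {"a": (1, 0)}
```

On these two examples the test distinguishes (ab)*, whose transition monoid is
aperiodic, from (aa)*, whose transition monoid contains a group of order two.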

The same approach also applies to FO+MOD-definable tree languages, and
other similarly defined languages.

4.5 Timed Models 

Timed automata appeared in the 1990s, to represent the behavior of finite state 
systems subjected to explicit time constraints (Alur, Dill 0). While they are 
already widely used, the foundations of the corresponding theory are still under 
development. There are several variants of these automata, such as event-clock 
automata, and of the models of timed computations (timed words, clock words, 
etc). There have also been several attempts to develop appropriate notions of
rational expressions that would be equivalent in expressive power to timed
automata, see Henzinger, Raskin, Schobbens; Asarin, Maler, Caspi [4]; Dima
[28]; Maler, Pnueli 55", among others. At the same time, timed automata and
timed languages may exhibit paradoxical behaviors, due to the continuous nature
of time, so the central ideas and techniques from the classical theory cannot
simply be enriched with timed constraints to account for the behavior of timed
automata.
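
To fix ideas, the following Python sketch runs a very restricted kind of
timed automaton on a timed word, that is, a sequence of letters with
non-decreasing timestamps. It is only an illustration under simplifying
assumptions that are not part of the general model: the automaton is
deterministic, guards are closed intervals on individual clocks, and the
encoding of transitions as a dictionary is a choice made here.

```python
def accepts(timed_word, transitions, initial, final, clocks):
    """Run a deterministic timed automaton on a list of (letter, timestamp)."""
    state = initial
    valuation = {c: 0.0 for c in clocks}
    now = 0.0
    for letter, timestamp in timed_word:
        delay = timestamp - now
        if delay < 0:
            return False                      # timestamps must not decrease
        # Time elapses: all clocks advance at the same rate.
        valuation = {c: v + delay for c, v in valuation.items()}
        now = timestamp
        step = transitions.get((state, letter))
        if step is None:
            return False                      # no transition for this letter
        guard, resets, target = step
        if not all(lo <= valuation[c] <= hi for c, (lo, hi) in guard.items()):
            return False                      # clock constraint violated
        for c in resets:
            valuation[c] = 0.0
        state = target
    return state in final

# Each a resets clock x; the following b must occur at most 1 time unit later.
transitions = {
    ("s0", "a"): ({}, {"x"}, "s1"),
    ("s1", "b"): ({"x": (0.0, 1.0)}, set(), "s0"),
}
```

For instance, the timed word `[("a", 0.5), ("b", 1.2)]` is accepted with
initial state `"s0"` and final states `{"s0"}`, whereas
`[("a", 0.5), ("b", 2.0)]` is rejected because the clock `x` exceeds 1 when
`b` occurs.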

The development of an algebraic apparatus and of a logical formalism is also
still in its infancy. One should mention however the recent work of Maler and
Pnueli [58], and the results of Francez and Kaminski [38] and Bouyer, Petit and
Thérien on generalizations of timed languages and automata, to automata on
infinite alphabets and to data languages, respectively. In both cases, an
interesting notion of algebraic recognizability is introduced, which is at
least as powerful as timed automata. Moreover, several logics have been
introduced (see for instance Demri, D'Souza "57), but none is completely
satisfactory with respect to the motivation of formulating and solving the
controller synthesis problem, and none is connected in a robust way to an
algebraic approach to recognizability.

The development of a complete theory of timed systems, incorporating
automata-theoretic, algebraic and logical aspects, appears to be one of the
more difficult challenges of the moment.